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Abstract — We present both offline and online maximum like- 
lihood estimation (MLE) techniques for inferring the static 
parameters of a multiple target tracking (MTT) model with 
linear Gaussian dynamics. We present the batch and online 
versions of the expectation-maximisation (EM) algorithm for 
short and long data sets respectively, and we show how Monte 
Carlo approximations of these methods can be implemented. 
Performance is assessed in numerical examples using simulated 
data for various scenarios and a comparison with a Bayesian 
estimation procedure is also provided. 



I. Introduction 

The multiple target tracking (MTT) problem concerns the 
analysis of data from multiple moving objects which are par- 
tially observed in noise to extract accurate motion trajectories. 
The MTT framework has been traditionally applied to solve 
surveillance problems but more recently there has been a surge 
of interest in Biological Signal Processing, e.g. see 113411 . 

The MTT framework is comprised of the following ingre- 
dients. A set of multiple independent targets moving in the 
surveillance region in a Markov fashion. The number of targets 
varies over time due to departure of existing targets (known 
as death) and the arrival of new targets (known as birth). 
The initial number of targets are unknown and the maximum 
number of targets present at any given time is unrestricted. At 
each time each target may generate an observation which is a 
noisy record of its state. Targets that do not generate observa- 
tions are said to be undetected at that time. Additionally, there 
may be spurious observations generated which are unrelated 
to targets (known as clutter). The observation set at each time 
is the collection of all target generated and false measurements 
recorded at that time, but without any information on the 
origin or association of the measurements. False measure- 
ments, unknown origin of recorded measurements, undetected 
targets and a time varying number of targets render the task 
of extracting the motion trajectory of the underlying targets 
from the observation record, which is known as tracking in 
the literature, a highly challenging problem. 

There is a large body of work on the development of 
algorithms for tracking multiple moving targets. These al- 
gorithms can be categorised by how they handle the data 
association (or unknown origin of recorded measurements) 
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problem. Among the main approaches are the Multiple Hy- 
pothesis Tracking (MHT) algorithm 112211 and the probabilistic 
MHT (PMHT) variant 026 L the joint probabilistic data asso- 
ciation filter (JPDAF) |U 2J], and the probability hypothesis 
density (PHD) filter M15L 12411 . With the advancement of Monte 
Carlo methodology, sequential Monte Carlo (SMC) (or particle 
filtering) and Markov chain Monte Carlo (MCMC) methods 
have been applied to the MTT problem, e.g. SMC and MCMC 
implementations of JPDA 11141 SMC implementations of 
the MHT and PMHT EE, and PHD filter JH |H B. 

Compared to the huge amount of work on developing track- 
ing algorithms, the problem of estimating the static parameters 
of the tracking model has been largely neglected, although 
it is rarely the case that these parameters are known. Some 
exceptions include the work of Storlie et al. 12511 where they 
extended the MHT algorithm to simultaneously estimate the 
parameters of the MTT model. A full Bayesian approach for 
estimating the model parameters using MCMC was presented 
in Yoon and Singh 13411 . Singh et al. 1230 presented an 
approximated maximum likelihood method derived by using 
a Poisson approximation for the posterior distribution of the 
hidden targets which is also central to the derivation of PHD 
filter in Mahler B15I1 . Additionally, versions of PHD and 
Cardinalised PHD (CPHD) filters that can learn the clutter 
rate and detection profile while filtering are proposed in flfill . 

In this paper, we present maximum likelihood estimation 
(MLE) algorithms to infer all the static parameters of the MTT 
model when the individual targets move according to a linear 
Gaussian state-space model and when the target generated 
observations are linear functions of the target state corrupted 
with additive Gaussian noise; we will henceforth call this 
a linear Gaussian MTT model. We maximise the likelihood 
function using the expectation-maximisation (EM) algorithm 
and we present both online and batch EM algorithms. For a 
linear Gaussian MTT model we are able to present the exact 
recursions for updating static parameter estimate. To the best 
of our knowledge, this is a novel development in the target 
tracking field. We stress though that these recursions are not 
obvious by virtue of the model being linear Gaussian. This 
is because the MTT model allows for false measurements, 
unknown origin of recorded measurements, undetected targets 
and a time varying number of targets with unknown birth 
and death times. To implement the proposed EM algorithms, 
an estimate of the posterior distribution of the hidden targets 
given the observations is required, and in the linear gaussian 
setting, the continuous values of the target states can be 
marginalised out. But, because the number of possible associ- 
ation of observations to targets grows very quickly with time, 
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we have to resort to approximation schemes that focus the 
computation in the expectation(E)-step of the EM algorithms 
on the most likely associations; that is, we approximate the 
E-step with a Monte Carlo method. For this we employ both 
SMC and MCMC which give rise to the following different 
MLE algorithms: 

« SMC -EM and MCMC-EM algorithms for offline estima- 
tion; and 

• SMC online EM for online estimation. 
We implement these three algorithms for simulated examples 
under various tracking scenarios and provide recommendations 
for the practitioner on which one is to be preferred. 

The EM algorithms we present in this paper can be imple- 
mented with any Monte-Carlo scheme for inferring the target 
states in MTT and reducing the errors in the approximation 
of the E-step can only be beneficial to the EM parameter 
estimates. We do not fully explore the use of the various Monte 
Carlo target tracking algorithms that have been proposed in 
the literature and instead focus on the following two. When 
using SMC to approximate the E-step, we compute the L- 
best assignments 111 811 as the sequential proposal scheme of 
the particle filter. This L-best assignments approached has 
appeared previously in the literature in the context of tracking, 
e.g. see Cox and Miller |6], Danchick and Newnam 

01, Ng 

et al. d. 

The MCMC algorithm we use for the E-step is the 
MCMC-DA algorithm proposed for target tracking in Oh et al. 
1 20]. For further assessment/comparison of the EM algorithms, 
we also implement a full Bayesian estimation approach which 
is essentially a Gibbs like sampler for estimating the static 
parameters that alternates between sampling the target states 
and static parameter. Note that the Bayesian approach is not 
novel and as it been proposed by Yoon and Singh H34fl . It is 
implemented in this work for the purpose of comparison with 
the MLE techniques. 

The remainder of the paper is organised as follows. In 
Section [II] we describe the MTT model and formulate the 
static parameter estimation problem. In Section [TTTJ we present 
the batch and online EM algorithms. Section [IV] contains 
the numerical examples and we conclude the paper with 
a discussion of our findings in Section [V] The Appendix 
contains further details on the derivation of the MTT EM 
algorithm, and details of the SMC and MCMC algorithms we 
use in this paper. 

A. Notation 

We introduce random variables (also sets and mappings) 
with capital letters such as X, Y, Z, X, A and denote their 
realisations by corresponding small case letters x, y, z, x, a. If 
a non-discrete random variable X has a density v[x), with all 
densities being defined w.r.t. the Lebesgue measure (denoted 
by dx), we write X ~ v(-) to make explicit the law of X. 
We use E#[-|-] for the (conditional) expectation operator; for 
jointly distributed random variables X, Y and Z and a function 
(x, z) — > f(x, z), Eg[f(X, Z)\Y — y] is the expectation of the 
random variable f(X, Z) w.r.t. the joint distribution of X, Z 
conditioned on Y = y. Eg [f(X, z)\y] is the expectation of the 
function x —> fix, z) for a fixed z given Y = y. 



II. Multiple target tracking model 

Consider first a single target tracking model where a moving 
object (or target) is observed when it traverses in a surveillance 
region. We define the target state and the noisy observation 
at time t to be the random variables X t G X C and 
Yt € y C R dy respectively. The statistical model most 
commonly used for the evolution of a target and its observa- 
tions {X t ,Y t }t>i is the hidden Markov model (HMM). In a 
HMM, it is assumed that {X t } t>1 is a hidden Markov process 
with initial and transition probability densities //^ and fy, 
respectively, and {Yt] t>1 is the observation process with the 
conditional observation density g^, i.e. 

Xi - M^(-), X t \(X 1:t -l = Xl-.t-l) ~ /v>(-K-i) 
Y t \ [{X, = Xi}^ , {Y 2 = yi} i7tt J ~ g^{-\x t ). 

Here the densities /xw,, and are parametrised by a real 
valued vector ip £ 9 C . In this paper, we consider a 
specific type of HMM, the Gaussian linear state-space model 
(GLSSM), which can be specified as 

fj^(x) = Af(x; fi b , E 6 ), U(x'\x) = Af(x'; Fx, W), 
g^(y\x) =Af(y;Gx,V). 

where Af(x; /i, E) denotes the probability density function 
for the multivariate normal distribution with mean fi and 
covariance E. In this case, ip = E&, F, G, W, V). 

In a MTT model, the state and the observation at each time 
(t > 1) are random finite sets, X t = (Xt,i,Xt,2, . . . ,X t> K*) 
and Yt — {Y t .i,Yt,2, ■ • ■ , Y tK v). Here each element of X f is 
the state of an individual target and elements of Y t are the 
distinct measurements of these targets at time t. The number 
of targets Kf under surveillance changes over time due to 
targets entering and leaving the surveillance region X. X t 
evolves to X t+ i as follows: with probability p s each target 
X t 'survives' and is displaced according to the state transition 
density in Q, otherwise it dies. The random deletion and 
Markov motion happens independently for all the elements of 
X f . In addition to the surviving targets, new targets are created. 
The number of new targets created per time follows a Poisson 
distribution with mean A& and each of their states is initiated 
independently according to the initial density /i^ in iff]). Now 
X t+ i is defined to be the superposition of the states of the 
surviving and evolved targets from time t and the newly born 
targets at time t+1. The elements of X t are observed through a 
process of random thinning and displacement: with probability 
Pd, each point of X t generates a noisy observation in the 
observation space y through the observation density g^ in (f2]l. 
This happens independently for each point of X f . In addition 
to these target generated observations, false measurements are 
also generated. The number of false measurements collected 
at each time follows a Poisson distribution with mean A / and 
their values are uniform over y. Y t is the superposition of 
observations originating from the detected targets and these 
false measurements. 

A series of random variables, which are essential for the 
statistical analysis to follow are now defined. Let C| be a 
Kt—i x 1 vector of l's and O's where l's indicate survivals 
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and O's indicate deaths of targets from time t — 1. For i 



■ K, 



t-i' 



1 i'fh target at time t — 1 survives to time t 
i'th target at time t — 1 does not survive to t 



The number of surviving targets at time t is Kf = 
We also define the Kf x 1 vector If containing 
the indices of surviving targets at time t, 



Note that If(i) will also denote the ancestor of target i from 
time t — 1, i.e. X t _i evolves to Xt j for i = 1, . . . , if t s . 
Denoting the number of 'births' at time n as A°, we have 
Kf = Kf + Kf. Note that according to these definitions, 
the surviving targets from time t — 1 are re-labeled as 
X t i , . . . , X t ki > an d the newly born targets are denoted as 

' ' * il 

Xt.K^+i, ■ ■ ■ , X t ,K? ■ Next, given Kf targets we define C t to 
be a Kf x 1 vector of l's and O's where l's indicate detections 
and O's indicate non-detections. For i = 1, . . . , i^f, 

^ J 1 i'th target at time t is detected at time t, 



cm 



i'th target at time t is not detected at time t. 



Therefore, the number of detected targets at time t is Kf = 
Y^iJi Similarly, we also define the Kf x 1 vector if 

showing the indices of the detected targets, 



k 

E 

3=1 



k: Y\Cf{j) = i 



i = \,.. 



,Kf 



lf(i) denotes the label of the z-th detected target at time t. 
So the detected targets at time t are X t jf^, . . . ,X t if( K dy 
Finally, defining the number of false measurements at time t 
as K{ , we have Kf = Kf + K( and the association from the 
detected targets to the observations can be represented by a 
one-to-one mapping 

A t :{l 7 ...,Kf}^{l,... 7 Kf} 

where at time t the i'th detected target is target lf(i) with 
state value X t and generates Y t J ^ t ^y We assume that 
A t is uniform over the set of all Kf\/K{\ possible one-to- 
one mappings. To summarise, we give the list of the random 
variables in the MTT model introduced in this section as well 
as a sample realisation of them in Figure Q] 

The main difficulty in an MTT problem is that in general 
we do not know birth-death times of targets, whether they 
are detected or not, and which observation point in Y t is 
associated to which detected target in X t . Let 

Z t = (c s t ,Cf,KlK{,A t 



be the collection of the just mentioned unknown random 
variables at time t, and 

6 = (ip, Ps ,p d ,\ b ,\f) £9 = $x[0,lfx [0,c») 2 



be the vector of the MTT model parameters. We can write the 
joint likelihood of all the random variables of the MTT model 
up to time n given 9 as 

:n? Xi :n , yi:n )pe(yi 
where 



(3) 



t=i V 

n I k t k t \ 

Pe(xi:„|«l:„) = Y{ II U( X tj\ X t-hilw) II ( 4 ) 

t=l \j=l J=fef+1 / 

n I f *t \ 

P»(yv. n \xl:n,Zl:n) =Y\ I^P^ Y[ 9*{Vt,a t (J) \ X t,ifU)) ^ 



( = 1 



3=1 



Here VO(k; A) denotes the probability mass function of the 
Poisson distribution with mean A, \y\ is the volume (w.r.t. 
the Lebesgue measure) of y and the term k(\/kf\ in (0 
corresponds to the law of A t . The marginal likelihood of the 
observation sequence yi : „ is 



»(yi:n) = ^8 N(yi:n|Xi : „, Z 1:n )} 



(6) 



The main aim of this paper is, given Yi :n = yi :rl , to estimate 
the static parameter 9* where we assume the data is generated 
by some true but unknown 9* G 0. Our main contribution is 
to present the EM algorithms, both batch and online versions, 
for computing the MLE of 9*: 

9ml = argmaxpe(yi : „). 

For comparison sake we also present the Bayesian estimate of 
9*. In the Bayesian approach, the static parameter is treated 
as random variable taking values 9 in with a probability 
density rj(9) and the aim is to evaluate the density of the 
posterior distribution of 9 given yi :n , i.e. 



p(9\yi:„) = 



V(0)Pe(yi:n) 



Yoon and Singh IB411 use MCMC to sample from p(9\yi :n ) 
which integrates both Metropolis-Hastings and Gibbs moves. 

III. EM ALGORITHMS FOR MTT 

In this section we present the batch and online EM al- 
gorithms for linear Gaussian MTT models. The notation is 
involved and we provide a list of the important variables used 
in the derivation of the EM algorithms in Table Q] at the end 
of the section. 



A. Batch EM for MTT 

Given Y 1:Jl = yi :n , the EM algorithm for maximising 
Pe(yi-.n) m © is given by the following iterative procedure: if 
9j is the estimate of the EM algorithm at the j'th iteration, then 
at iteration j + 1 the estimate is updated by first calculating the 
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Complete list of random variables of the MTT model 

-Xt,fc> YtX- fc'th target and fc'th observation at time t. 

X t = {Xi , . . . , Xk% }, = {it,i , . - . , Y t K v}: Sets of targets and observations at time t. 

Kf, K{ : Numbers of newborn targets and false measurements at time t 

Kf, Kf: Numbers of targets survived from time t — 1 to time t and detected at time t. 

Kf, Kf: Numbers of alive targets and observations at time t. Kf = Kf + Kf, Kf = Kf + K{ . 

Cf: Kf_i x 1 vector of O's and l's indicating surviving targets from time t — 1 to time t. 

Cf: Kf x 1 vector of O's and l's indicating detected targets at time t. 

If: Kf x 1 vector of labels of surviving targets from time t — 1 to time t. 

if: Kf x 1 vector of labels of detected targets at time t. 

At : {1, . . . , Kf} — > {1, . . . , Kf}: Association from detected targets to observations at time t. 




Fig. 1. Top: Complete list of the discrete random variables of the MTT model. Bottom: A realisation from MTT model: States of 
a targets are connected with arrows and with its observations when detected. Undetected targets highlighted with shadows, and false mea- 



surements are coloured grey. Cj.g 



([],[!, 1,1], [1,0, 1,1], [0,1,1], [1,1, 1,1]); 7f 



([1, 1, 0] , [0, 1, 1, 1] , [1, 1, 1] , [0, 1, 1, 0] , [1, 1, 1, 1]) 



K d 



(2, 3, 3, 2, 4); kL = (3, 0, 2, 1, 0), 



J l:5 



([1, 2] , [2, 3, 4] , [1, 2, 3] , [2, 3] , [1, 2, 3, 4]); Kf 



([],[!, 2, 3], [1,3, 4], [2, 3], [1,2, 3, 4]); C* s 



(0,3,3,2,4): 



K b 

rt l:5 



(3,1,0,2,0); 



([4, !],[!, 3, 2], [3, 5, 4], [1,2], [3, 2, 1,4]). 



following intermediate optimisation criterion, which is known SV, n (xi :n , Zi- n ) are: 
as the expectation (E) step, 



Q(0j,6) = E 0J [logpe(Xi : „,Zi : „,yi : „)|yi : „] 
\ Sj [logp e (Z 1:n ) +logpe(Ku n ,yv.n\Z 

'-6) [l0gPe(^l:n) 

-U s {log P6 (X.l:nj yi:n|^l:n)|yi:n) |yi: 



t=l k=l t=l k=l 

E 8j [lo ) + logp e (Xi : „,yi 

:n \Zl:n) |yi:n] \ ^ \ ^ j. \ ^ \ ^ r 



x t,kxT k , (8) 



t-2 k=l t=2 fe=l 

n k t n k t n k t 

J2J2 xt ^^( k ) x l^ xt ^ H 

t=2 k=l t=\ fe=fe|+l t=l fc=fef+l 



T 

x t,k x t,k 



The updated estimate is then computed in the maximisation 

(M) step These sufficient statistics are related to those used for es- 

timating the static parameters of a linear Gaussian single 
target tracking model, and this relation will be made more 
+i = argmax(2(#j, 6) explicit later. The rest of the sufficient statistics S%. n (zi- n ) to 

S'i5 ! „(zi : „) do not depend on Xi !n . 

[Ss, ni ■ ■ ■ 1 <Sl5,r!,] (Zl:n) 



3 " " see 



yt,at{k)yt,a t (k)' ^ti^ti^ti li ; K > 1 



k=l 



This procedure is repeated until 6j converges (or in practice 
ceases to change significantly). From equations ©-(O, it can 
be shown that the E-step at the j'th iteration reduces to — 2. 

calculating the expectations of fifteen sufficient statistics of * _1 
xi :ll , z 1:n and y 1:n denoted by S lyn , . . . , S 15 . n . (From now 
on, any dependancy on yi :Jl in these sufficient statistics and 

further variables arising from them will be omitted from the Let n denote the expectation of the ?n'th sufficient statistic 
notation for simplicity.) Sufficient statistics Sx >n (xi :n , zi- n ) to w.r.t. the law of the latent variables Xi : „ and Z\ :n conditional 



(9) 
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upon the observation yi :n for a given 9, i.e. 

e __ \ Ee [S m , n (Xl : „, Zi :n )\ yi :n ] 1 < w < 7, 
} E e [S m ,„ (Zi : „)| yi:„] 8 < m < 15. 



(10) 



Then the solution to the M-step is given by a known function 
A : { (Sf . n , . . . , Sf 5 n ) } -> such that at iteration j 



9 j+ i = argmaxQ^, 0) = A 5^„, . . . , 5 



The explicit expression of A depends on the parametrisation 
of the MTT model, in particular on the parametrisation of the 
matrices F,G,W,V, pb,^b as m the following example. 

Example 1. (The constant velocity model:) Each target has a 
position and velocity in the xy-plane and hence 

X t = [X t (l),X t (2),X t {Z),X t (A)f eX = R 2 x [0,oo) 2 , 

where X t (l),X t (2) are the x and y coordinates and 
Xt(3), A" t (4) are the velocities in x and y directions. Only 
a noisy measurement of the position of the target is available 

[Y t (l),Y t (2)}ey=[-K, K ] 2 . 

We assumed a bounded y and regard observations that are not 
recorded due to being outside this interval as also a missed 
detection. With reference to ©, the single target state-space 
model is 



CT 2 p^2x2 02x2 







2x2 C 6l) /2X2 



F= I J* 2 A / 2X2 ), G=( /2X2 2x2 ) 
U2x2 ^2x2 ' 



w = xp 



C™/2x2 02x2 



02x2 <J 2 , v l2x2 I 



, V = ail 



2x2 



Therefore, the parameter vector of this MTT model is 

9 = (Afc, Xf,Pd,Ps, Hbp, Hbv,Vbpi (J bvi< J tpi< J 'ivi< J y) ■ 

The update rule A for 9 at the M-step of the EM algorithm is 

fJ-bx — SQ. n (l)/Si 3n , flby = SQ. n (2)/S 13 ^ n , 

G lp = 2 S 13,n tr (( S 7,n - 2S lnl i b + S 13,n^bl^) M% ' M p ) 

= \ S 9 i 3 , n tr((S e 7 , n - 2Sl n £ + S? 8 ,nW*6 ) MjM v ) 
°l v = tr (Sl n M^M p - 2Sl n M p F p + S^F? F p ) /2S e 11>n , 
°L = ^ {Sl,MM v - 2Sl n M v F v + S^F? F v ) /2S 11<n , 
a 2 = tr (S s<n — 2GS^ n + GS l n G T ) /2S® n , 
Pd = Sg n /Si 0nl p s — Sfi ! „/S , f 2iri , 
Ab = Sf 3n /S® 5n , Xf — Si 4n /Sf 5n , 

where M p = [J 2x2 2X 2] , M v = [0 2X 2 ^2x2], and F p 
and F v are the upper and lower halves of F, that is F p (i,j) = 
F(i,j) and F v (i,j) = F{2 + for i = 1,2 and j = 
1,...,4. 



1 ) Estimation of sufficient statistics: It is easy to calculate 
the expectation of the sufficient statistics in (0 that do not 
depend on x 1:n . Noting that Z t is discrete, we simply calculate 
S m ,n(zi:n) for every zi :n with a positive mass w.r.t. to the 
density pe{zi-.n\yi:n) an d calculate the expectations as 



nU 
O,, 



Zl-.n 



S m ,n(zi:n)pe(zi:n\yi:n)- 



For those sufficient statistics in ([8]) that depend on Xi : „, con- 
sider the last expression in © with the following factorisation 
of the posterior 

:n? yX:n)P0 i^X'n |yi:n) ■ 

This factorisation suggests that we can write the required 
expectations as 

Sm,n = ^-8 [S'm,n(Xi : „, Zi- n )\ y\- n ] 

= Eg [Eg [5 m ,„(Xi :n , Zi :n )\ -Zi :n , y 1:n ] | y 1:n ] . (11) 

Let us define the integrand of the outer expectation in (fTTl i 
which is the conditional expectation 

S m ,n( z l:n) = Eg [S m ,nP^-l:n, Zl :n )\ Zl :n , Ylm] ■ 

as a matrix- valued function with domain Z n . Then, we can 
obtain Sf nn by calculating n (zi :n ) for every z lvn with a 
positive mass w.r.t. the density pe(zi : „|yi :n ) and then calculate 



S 



Zl:n 



St n {zi:n)pe{zi:n\yi:n)- 



The crucial point here is that it is possible to calculate 
Sj, „(zi : „) for any given z\ vn . In fact, the availability of 
this calculation is based on the following fact: conditional 
on {Z t } t>1 , {X t , Y t } 4>1 may be regarded as a collection of 
independent GLSSMs ( with different starting and ending times, 
possible missing observations) and observations which are not 
relevant to any of these GLSSMs. In the context of MTT, each 
GLSSM corresponds to a target and irrelevant observations 
correspond to false measurements. We defer details on how 
Sm n( z i--n) * s calculated to Section UlI-BI 

2) Stochastic versions of EM: For exact calculation of the 
E-step of the EM algorithm we need pg(zi- n \yi- n ) which is 
infeasible to calculate due to the huge cardinality of Z n . We 
thus resort to Monte Carlo approximations of Pe{ z i:n\yi:n) 
which we then use in the E-step; in literature this approach 
is generically known as the stochastic EM algorithm Q @, 
Bin i. We know from the previous sections that given Z\- n = 
zi- n the posterior distribution pe(xi :n |yi :n , z\- n ) is Gaussian 
and conditional expectations can be evaluated. Therefore, it is 
sufficient to have the Monte Carlo particle approximation for 
Pe{zi-. n \yi:n) only, which is expressed as 



/V 



PO 



(*l:„|yi:n) = £ wtfty (z Un ), J2 W n = L ( 12 > 



Then, the corresponding particle approximations for the ex- 
pectations of the sufficient statistics are 

8<m<15. 
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When changes with each EM iteration, the appropriate 
update scheme at iteration j involves a stochastic approxima- 
tion procedure where in the E-step one calculates a weighted 
average of Sfy „,..., Sm,n', the resulting algorithm is known 
as the stochastic approximation EM (SAEM) ]9[]. Specifically, 
let 7 = {■jj },->i> called the step-size sequence, be a positive 
decreasing sequence satisfying 



£7j 



3 



A common choice is jj = j a for 0.5 < a < 1. The SAEM 
algorithm is given in Algorithm Q] 

Algorithm 1. The SAEM algorithm for the MTT model 

Start with 6\ and S~®ln,„ = for m = 1, . . . , 15. For j = 1, 2, . . . 

• E-step: Calculate Sm,n for each m, and then calculate the 
weighted averages 



qU) _ (-i 

l -'7,m,n V 



7j 



S, 



(j-i) 



,n < h J rn,n ■ 



M-step Update the parameter estimate using A(- 



(13) 
as before 



AS. 



JO) 

7,1, n ' ' 



qU) 

' °-y,15,n 



In general, the Monte Carlo approximation pe (zi :n \yi:n) 
in (foi l is performed either sampling A samples from 
Pe • (zi:n|yi:n) using a MCMC method (in which case weights 
= 1/iV, i = 1, . . . , N) or using a SMC method with N 
particles. Depending on which method is used, we will call 
the resulting algorithm MCMC-EM or SMC-EM, respectively. 
For MCMC, we use the MCMC-DA algorithm of [20], but 
with some refinements of the MCMC proposals. (Details are 
available from the authors.) 

We use SMC to obtain the approximations 
{Pe( z i:t\yi:t)}i<:t<n sequentially as follows. Assume 
that we have the approximation at time t — 1 



Pe{zi;t-\\yi;t-i) 



N 

£* 



1<*M 0l:t-l)- 



To avoid weight degeneracy, at each time one can resample 
from pe(zi-.t-i\yi:t-i) to obtain a new collection of N 
particles and then proceed to the time t. Alternatively, this 
resampling operation can be done according to a criterion 
which measures the weight degeneracy (e.g. see Doucet et al. 
1 1 IIP. We define the N x 1 random mapping 



n t : {l,...,N}^{l,...,N} 

containing the indices of the resampled particles, i.e. Ht(i) = j 
if the i'th resampled particle is z-jf.^i- (If no resampling is 
performed at the end of time t — 1, then n t (i) = i for all 

(i) 

i.) Then, given y t and H t — Tr t , the particle z t 
sampled from a proposal distribution 



at time t is 



for i = 1, . . . , N. Therefore, zf 1 is connected to z\[ 



yut 



the i'th path particle at time t is z[ l ) t 
new weight is 



(i) jTtW) 
Z l:t-1 



t-1 and 

) and its 



,-,(t«W) 



Pe( z t l z t-i JPe(yt|yi:t-i^i:tJ 



(14) 



where, for i = 1, . . . , N, we take i/J^i = V-^ ^ resampling 
is performed and = ^t_i otherwise. 

Note that we also need to implement SMC for the online 
EM algorithm in order to obtain a Monte Carlo approximation 
of the E-step. Our SMC algorithm calculates the i-best linear 
assignments 111 80 as the sequential proposal; see Appendix IB] 
for details. 

B. Online EM for MTT 

We showed in the previous section how to implement the 
batch EM algorithm for MTT using Monte Carlo approxima- 
tions. However, the batch EM algorithm is computationally 
demanding when the data sequence y\- n is long since one 
iteration of the EM requires a complete browse of the data. 
In these situations, the online version of the EM algorithm 
which updates the parameter estimates as a new data record 
is received at each time can be a much cheaper alternative. 
In this section, we present a SMC online EM algorithm for 
linear Gaussian MTT models. 

An important observation at this point is that the sufficient 
statistics of interest for the EM algorithm have a certain 
additive form such that the difference of iS m)n (xi : „, z\- n ) and 
S m ,n-i{xi:n-i,zi: n -i) only depends on (x n _ 1 ,x n ,y ll ). This 
enables us to compute the required expectations in the E-step 
of the EM algorithm effectively in an online manner. We shall 
see in this section that, with a fixed amount of computation and 
memory per time, it is possible to update from S® n t _ 1 (zi :f _i) 
to Sf n t(zi-.t) given y t and zt at time t. To show how to handle 
the sufficient statistics in (0 for the MTT model, we first start 
with a single GLSSM and then extend the idea to the MTT 
case by showing the relation between the sufficient statistics 
in a single GLSSM and in the MTT model. 

1) Online smoothing in a single GLSSM: Consider the 
HMM {X t ,Y t } t>1 defined in dTJ. It is possible to evaluate 
expectations of additive functionals of X\- n of the form 

n 

S n (xi:n) = s(xi) + s(x t -l,X t ) 
t=2 

(with possible dependancy on yx :n also allowed) w.r.t. the 
posterior density pe(xv.n\yi-.n) in an online manner using only 
the filtering densities {pe(xt\yi:t)}i<t<n- The technique is 
based on the following recursion on the intermediate function 



T t e (x t ) :=E e [S t (X 1:t )\X t 
=E 9 [2ti(*t-i) 



■ s(X t -.i,x t )\ yut-i,%t] (15) 



,yi-.t) 



with the initial condition Tf{xi) = s(xi). Note that the ex- 
pectation required for the recursion is w.r.t. the backward tran- 
sition density pe(xt-i\yi:t-i)Xt). The required expectation 
Eg [S n (Xi :n )\yi :n ] can then be calculated as the expectation 
of the intermediate function T%(x n ) w.r.t. the filtering density 
Pe(x n \y 1:n ), that is, 

[S n (X 1:n )\ y 1:n ] = E e [T 9 n {X n )\ y 1:n ] . 

Consider now the GLSSM that is defined in (O, where, 
additionally, Y t is possibly missing/undetected and Cf is the 
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indicator of detection at time t. It is well known that, given 
{(Y t , Cf) — (yt, cf)}t>i, the prediction and filtering densities 
Po{xt\yi:t-i,<4-.t-i) and Pe(xt\yi:t,cf. t ) are Gaussians with 
means (fi*t\t-i, Ht\t) an d covariances (S 4 i t _i, S 4 i t J and are 
updated sequentially as follows: 



(Mt|t-i) s *|t-i) — Fn t ^ 1 \t-i,FT, t u_ 1 F 



W, 



(16) 



i + ^t\t-iG T T t 1 e t , 



St|t-iG r t GE t | t _i) , 



(17) 



. (A*t|i-1; ^t|i- 



0. 



such that S m ,i( x i-i,cf.i,yia) for 
following form: 

l 

X c t xtx t 



1 , . . . , 7 are in the 



t—i 
i 

E 



T 



t=i 

t=2 



t=2 



T 

X\ ■ X x • 



(18) 



(so, c?2 = d y and e?6 = 1> else d m = d x )- These functions 
are actually the sufficient statistics in the MTT model corre- 
sponding to a single target. Then it is possible to define the 
incremental functions 



s m : (XUX 2 ) x {0,1} x y 



>d r x d r , 



(19) 



where s m 's are defined such that for m = 1, . . . , 7 



S m ,i( x i-i,cf.i,yid) = s m (xi,cf, y%)+^2 s m(xt-i,x t ,Ct, y t ). 



4=2 



For example, si(xi, , j/i) = cfx^J, s 3 (x 1 ,cf,y 1 ) = 
Od x xd x , s 5 (x t _i,xt,cf,y t ) = xt-ixf, s 6 (xi,cf,yi) = xi, 
S7(xt-i,Xt,cf,yt) = Od x xd x , etc. We observe that each suffi- 
cient statistic is a matrix valued quantity, hence its expectation 
can be calculated using forward smoothing by treating each 
element of the matrix separately. For example, for 



C t XtX t , 



t=l 

we perform forward smoothing for each 

n 

Si, n ,ij(xi :n ,cf. n ,y 1:n ) = ^2c$xt{i)x t {j), i,j 
t=i 



..,d x . 



It was shown in Elliott and Krishnamurthy [12] that, the 
intermediate function 

T° itij (x t ,4 :t ) ■= U [S 1 ,t, ij (X 1:t ,c d 1:t ,y 1:t )\ 4 f ,x t ,y ltt ] 



xfPi tt ,ijX t I '/,'.,.,, 



for the i,j'th element is a quadratic in x t : 

(20) 

where Pi,t,ij is a d x x d x matrix, qxjjj is a d x x 1 vector, and 
ri t y is a scalar. Online smoothing is then performed via the 
following recursion over the variables Pi,t,ij, Qi,t,ij, ^i,t,ij- 

Pi,t+x,ij = B t P\j t ijBf + cf +1 eiej , 
Ql,t+i,ij = Bjqi ttit j + Bj (Pi,t,ij + Pi,t,ij) &tj 
r\,t+\,%j = ri,t,ij +tr (Pi^E, 



where r t = GE t | t _ 1 G T + V and e t = yt — Gfj, t u_i. Also, 
letting B t = T, t \ t F T (FT, t \tF T + W)~\ b t = (I dxX d x - 
B t F)p. t \ t , and £ t |t+i = (Id x xd x - S t F)St| t we can show 
that the backward transition density required for the forward 
smoothing recursion (TT3T > is Gaussian as well 

pe(x t -i\yut-i,ci. t _i,x t ) = M {x t -\\ B t -ix t + &t-i) £t-i|t) 
We define the matrix valued functions 

S m j : X 1 x {0,1} 1 xy l -> R d * xd ™, 



where ei is the i'th column of the identity matrix of the size d x , 
and tr(A) is the trace of the matrix A. For the initial value of 

Tl t x,ij{xi,cf), Pl,l,ij = c l e i e J) = 0^x1,^1, i ti j = 0. 

Therefore, the i,j'fh element of the required expectation at 
time n can be calculated as 



2/1: 



i) c l:n] = 
T 



-T 



We can similarly obtain the recursions for the other sufficient 
statistics in terms of variables P m .tM< Qm.t.n, Trn..t.M for the 
m'th sufficient statistic (see Appendix lAl 1 1211 . 

Remark 1. Note that P\,t, 3 % = (Pi,t,ij) T 
(similarly for Qi,t,ij) an d therefore need only be 
calculated for j > i. Note that the variables 

Mt|t> ^t(t) Tt, £(, Bt, bt, Y, t \t+1, Pm,t,ij, Qm,t,ij, ^rn,t,tj 

obviously depend on cf. t , yi-t and 9, but we made this 
dependancy implicit in our notation for simplicity. We will 
carry on with this simplification in the rest of the paper. 

2) Application to MTT: We showed above how to calculate 
expectations of the required sufficient for a single GLSSM. 
We can extend that idea to the scenario in the MTT case, 
where there may be multiple GLSSMs at a time, with different 
starting and ending times and possible missing observations. 
Recall that at time t the targets which are alive are the fc| 
surviving targets from t — 1 and the k\ newly born targets at 
time t, so the number of targets is kf = fef + k\ . For each 
alive target, we can calculate the moments of the prediction 
density pe(xt,k\yv.t-u z ui) for the state 



(/^It-i.fci ^t|t-i,fe) = 



{Ff-t-l\t-l,i s t (k)j 

(/ifc,Sf,) , 



i{k) FT 



w) ,k - k&t ' 

fef < fc < kf 



Recall that if(k) appears above due to the relabelling of 
surviving targets from time t — 1. Also, given the detection 
vector cf and the association vector a t , we calculate the 
moments of the filtering density pe(^t,fc|yi:i, zi-.t) f° r the 
targets using the prediction moments 

i IH\t—i,k + ^t\t-i,kG T T t let,k, 



J t\t-l.k 



- E *l 



t-l,k 



G T rr,iGE 



t,k^ n t\t-i,k 



where T t<k = GS t | 4 _ 1 , fc G T + V and t t ,k 

^k 



cm 

cf(k) 



0. 



yt,a t (i' t (k)) ~ 

G/H\t-i,ki where i' t (k) = Ylj=i c tti)- Note that if me ^' th 
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alive target at time t is detected, it will be the i£(fc)'th detected 
target, which explains i' t (k) in t t .k- In a similar manner, we 
calculate B t>k , b t , k , and E t | t+1)jfc using fj, t \ tik and S t | t)fc for 
k = 1, . . . , kf in analogy with B t , b t , and £ t | t+1 . 

In the following, we will present the rules for one-step 
update of the expectations 

:ni Zl:n)\ Yl-.m z \:n\ 

of the sufficient statistics S , rj ^ n (xi :n , Zi :n ) that are defined in 
©. Observe that we can write for 1 < m < 7, 

n 

(Xl :Il ,Zl :n ) = S m (xi,Zl) + ^ S m (Xt-l, X t) 2T t ), (21) 

t=2 

where the functions s m can be written in terms of s m 's (fT9] l 
as follows: 



s m (xi,zi) = E s m (xx,k, cf(k), yi, ai (^ (fc))), 
fe=i 

Sm(Xt-l,X(,Z() = s m (x t _ lti s( k ),x ttk ,cf(k),y t!at (i> t ( k ))) 



+ E S m(xt,k,(4(k),yt,at(i' t (k)))- 

where, again, i' ( (fc) = EjLi c i (.?')• (Notice that if cf(fc) = 
this i' t (k) can still be used as a convention; since the choice 
of the observation point in y t is irrelevant as it will have no 
contribution being multiplied by cf (fc).) Therefore, the forward 
smoothing recursion for those sufficient statistics in dS) at time 
t 

+S m (X t _l, X t , Z t ) |X(, yi :t _l, Zl :t _l] 

can be handled once we have the forward smoothing recursion 
rules for the sufficient statistics in ( fT~8T >. For k = 1, . . . , kf , let 
t fe denote the forward smoothing recursion function for 
the m'th sufficient statistic for fc'th alive target at time t. For 
the surviving targets, fc'th target at time t is a continuation 
of the i|(fc)'the target at time t — 1. Therefore, we have the 
recursion update for Tr, t fe for 1 < fc < fc| as 

+Sm(^t-l,-i|(fe))^,fe, c ?( fc ); 2/a t (iJ(fc)))| 5Ct,k) yi:t-l, «l:t-l 

For the targets born at time t (for fc| + 1 < fc < fcf 
), the recursion function is initiated as t k (xt, k , Z\-t) — 
s m {xt,k, cf(k)). Therefore, the (i,j)'th component of the 
recursion function can be written as 

T m ,t,k,ij( X t,k, Zl : t) = xJkPm,t,kajXt,k+qm,t,k,ij X t,k+r m ,t,k,ij 

similarly to the single GLSSM case, where this time we have 
the additional subscript fc. For surviving targets the recur- 
sion variables Pm,t,k,ij^9m,t,k,ij,r m> t,k,ij for each m,i,j are 
updated from P m ,t-i,i|(fc),y , Q m ,t-i,q(k),iji ^m,t-\,%% (fc),*j. by 
using /i{_i|t_i,jj>(fc), ^t-i\t-i,if(k)> 5t-i,if(fc)' h-i,q(k)> 
E t -i|t,i f (fe), cf(fc) and, y t , at (i' t {k)) with = EjLi 4 0')- 



For the targets born at time £ (for kf + 1 < fc < kf ), the 
variables are set to their initial values in the same way as in 
Section |IIFBT] using cf(k) and, if cf(fc) = 1, y t , at (i' t {k))- The 
conditional expectations of sufficient statistics 

£m,t(2l:t) = E 9 [ T m,t (X t , y 1:4 , Z 1:t ] 

can then be calculated by using the forward recursion variables 
and the filtering moments. Let 



m,t,k 



(zut) = te[T° k (X tik ,z 1: t)\yi:t,z 1:t \ 



denote the expectation of the m'th sufficient statistic for the 
fc'th alive target at time t, where its (z,j)'th component is 

^m,t,k,ij — tr [Pm,t,k,ij yfJ-tlt.k^tM + ^t\t,k 

T 

+ Qm,t,k,ij^t\t,k + f mi t,k,ij- 

Then, the required conditional expectation for the m'th suffi- 
cient statistic can be written as the sum of two quantities 



■Sm,i( Z l:t) — Salive,m,t( Z l:t) + S d ead.rn,t {z\-t ) ■ 



(22) 



where the quantities are respectively the contributions of the 
alive targets at time t and dead targets up to time t to the 
conditional expectation S® t t {zi-t) 



3 alive,m,t 



S" 



dead,m,t 



(Zl;t 



{Zl:t) 



E 

fc=i 
t 



qti 



(Zl:t), 



E E 

J = l fc:c=(fe)=0 



^m,j— i,fc( z iy-i) 



(23) 



As d22"i i shows, we also need to calculate S e dead t (zi : t) at 
each time and by ( f23l this can easily be done by storing 



eod m at ti me £ — 1 and using the recursion 

SZ, t_ ljfc (2l:t-l) 



fe:c|(fc)=0 

where the terms in the sum correspond to targets that terminate 
at time t — 1. 

Finally, the sufficient statistics Sg iT j(^i :T j), . . . , S , i5, n (^i:n) 
can be calculated online since we can write for each m = 
8,..., 15 



Sm,n( z l:n) — ^ ' s m j z t) 



for some suitable functions s m which can easily be constructed 
from (0. Hence they can be updated online as 



Sm,t(zi:t) — S m ,t-l(Zl:t-l) + S m (zt). 



(24) 



We now present Algorithm [2] to show how these one-step 
update rules for the sufficient statistics in the MTT model can 
be implemented. For simplicity of the presentation, we will use 
a short hand notation for representing the forward recursion 
variables in a batch way. Let 7^ jt (2i:t) = (T^ l t k (zi-t),k = 
1, . . . , kf) where 

Xm,t,k( z l--t) = (Pm,t,k,ij^Qrn,t,k,ij,rm,t,k,ij ■ a U hj) 

denote all the variables required for the forward smoothing 
recursion for the m'th sufficient statistic for the fc'th alive 
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target at time t. We can now present the algorithm using this 
notation. 



Algorithm 2. One step update for sufficient statistics in the MTT 
model _ 

We have 7^ lt _i(zi:t-i)» Sa ea d,m,t-i{ z i:t~i), m = 1, . . . ,7, 
^m' t-i( z i:*-i)' rn' = 8, ... ,15 at time t — 1. Given z t and yt, 
- Set i x = 0, i d = 0, Sf toe)mjt (2!i:t) = and 5| eadjmjt (^i :t ) = 

Sdead,m,t-l{zi:t~l) for 171 = 1, . . . ,7. 

-/ ri = l,...,fcf_ 1 + fc| 

• if « < fcf-i and cf (*) — 1» ( f ne i'th target at time t—1 survives), 
or if i > fcf_i, fa new target is born), set i x — i x + 1. 



- /n case of survival, use /it-i|t-i.i and St_i|t_i,j to obtain 
the prediction moments Htu-i.ix and ^t\t-\,i x - I n case °f 
birth, set the prediction distribution p>t\t-i,i x — an d 
St|t_x,i = Sfc. 

* If cf(i x ) = 1, i^'di target is detected: id = id + 1. 
t/se fit\t-i,i x and E t | t _ 1]iaJ and Vt, at ^ d ) to update the 
filtering moments fM\t,i x and E t | t ix . 

* 7f cf(i x ) — 0, ia;'di target is not detected: Set 
(fJ>t\t,i x i ^t\t,i x ) = \lH\t-l,i x i E t | t _i, ix ). 

- For m = 1, . . . , 7 

* /n case o/ survival, update the recursion vari- 
ables T m ,t,i x ( z l:t) u M n 8 7m,t-l,i(Zl:t-l), Mt-l|t-l,i> 

Et-i|t-i,j> b t -i,i, B t -i,i, Et_!| tii , c t (ia;) and 
Vt,a t (i d ) if c${ix) = 1- ^n case o/ birth, initiate 
Tm,t,i x {zi :t ) using cf(i x ) and y t ,a t (i d ) if cf(i x ) = 1. 

* (optional) Calculate S 1 ™,*,^ (zi:i) 

lH\t,i x and E t | M ^ and update Sa iit , e , m> t(>i:t) <- 

Salive.m.t i z l:t) + S m ^ j j J (fl:t)- 

if i < kt-\ and c|(i) = 0, die i'fn target at time t—1 ;s dead. 
For m = 1, . . . , 7, 

- Calculate S m ,t-i,i{ z i--t- 
Ht-i\t-i,i and E t _ 1 | t _ lj j. 

- f/pdato S.ie.^^t^irt) r- >'." -.,„ ,Uh) - 



from T m ,t~i,i(zi-.t 

~ Sdead,m,ti z ~*-:t) 



- (optional) Update Sm,t(zi-. 
for m = 1, . . . ,7. 

- Update S m ,t(zi:t) = Sm,t-i(zi:t-i) + s 



*5 'alive, m,t (*l:t) + S*2 

(zt) for m = 8, . . . , 15. 



Notice that the lines of the algorithm labeled as "optional" 
are not necessary for the recursion and need not to be per- 
formed at every time step. For example, we can use Algorithm 
|2] in a batch EM to save memory, in that case we perform 
these steps only at the last time step n to obtain the required 
expectations. Notice also that we included the update rule for 
the sufficient statistics in (O for completeness. 

3) Online EM implementation: In order to develop an 
online EM algorithm, we exploit the availability of calculating 
Sf t , . . . , i5| t and Ss,t, ■ ■ ■ , 5*i5,t in an online manner as shown 
in Section ITII-B2I In online EM, running averages of sufficient 
statistics are calculated and then used to update the estimate 
of 6* at each time |0, 0, EH 0- Let 6 i be the initial guess 
of 9* before having made any observations and at time t, 
let 0\..t be the sequence of parameter estimates of the online 
EM algorithm computed sequentially based on yi ; t_i. When 
y t is received, we first update the posterior density to have 
P6i t ( z i--t\yi:t)i an d compute for 1 < m < 7 

T ^m,t (*t, *l:t) = E 1:t (1 - 7t) T 7,m7t-l ( X *-l i *l:t-l) 

+ 7*s m (X f _i,x t ,z t ) x t ,yi :f _i,zi :t _i (25) 



for the values Z\ v , 



for i = 1 , . . . , N, where we have 



the same constraints on the step-size sequence {7t} f>1 as 
in the SAEM algorithm. This modification reflects on the 
updates rules for the variables in T® n t . To illustrate the change 
in the recursions with an example, the recursion rules for 
the variables for Sx,t(xi-.ti c i-.t) f° r tne simple GLSSM case 
become (see Appendix lAl> 

P~f,i,t+i,ij = (1 - jt+i)B?P 7t i >t ,ijB t + 7t +1 Ct +1 e i eJ 
q~/,i,t+i,ij = (1 - lt+i)(Bjq lt i, t ,ii 



Bj (Pj,l,t,ij + P~A 



hi 



r~/,i,t+i,ij = (1 ~ 7t+i)( r 7,i,*,ii +tt(P 



So this time we have t( z i-.t) 
1, . . . , kf) where 



'^7,m,i,/c('^l : ^) (yPf ,m,t,k,ij i Q'y ,m,t,k,ij ,m,t,k,ij • ^ j) • 

and the conditional expectations 



can be calculated by using 72 ^* t k (zi : t) as in Section HII-B2I 
Finally, regarding those S m .t in ©, we calculate 8 < m < 15. 

S 1<m ,t (Zl:t) = (1 - 7t) 5 '7,m,t-l (Zl-.t-l) + 7t s m (z t ) ■ (26) 



U) 

for the values z\-.t = z\{ t for i = 1, 
tion step, we update 6 t+ i = A (sf^fj, Sfy A 
expectations are obtained 



N. In the maximisa- 
where the 



,(») qS 



S. 



EiIi^ W ^7,m,t(^S), 8<m<15 



In practice, the maximisation step is not executed until a burn- 
in time tb for added stability of the estimators (e.g. see Cappe 
A). 

Notice that the SMC online EM algorithm can be imple- 



2] the only changes are 
Algorithm [3] describes 



mented with the help of Algorithm 
d25l and (|26]l instead of (g^ and (f24i 
the SMC online EM algorithm for the MTT model. 

Algorithm 3. The SMC online EM algorithm for the MTT model 
• E-step: If t — 1, start with 0\, obtain ps 1 (zi\yi) = 
Sili w i & W ( 2 i)> and for i = 1, 



, TV initialise 



v^a^i s;: dead>m> M i >) for m 

S' 7 ,m',i(2i' ) ) far m' = 8, 

lft>X 

Obtain p Bl:t {z 1:t \yi : t) 



15, 



1, . . . , 7 and 



i=1 wl '5 w (zi 



from 



pe l!t _ I (2i:t-i[yitt-i) a/ong w/fn 7r t . 

For i = 1, . . . , N, set j = TVt(i). Use Algorithm [2] with the 
stochastic approximation to obtain 



V%,t(*&)- S MmMvi) for m = 1 

m i t{z\\) f or m' — 8, ... ,15 from 
^I^^li^^aW-i^l!) M m = 1,...,7 
and S^m'.t-iCZift-i) /or m' = 8, . . . , 15. 
M-step: If t < t b , Ot+i = flt. F/se, /or i_= 1,...,7V, 
m = 1, .... 7 ca/cMtoe 5^ e , mit (z«) and S**, t (*$) = 



, 7 and 
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TABLE I 

THE LIST OF THE EM VARIABLES USED IN SECTIOnHTTI 



< 1 a t iim , m , t (4:l) + ^^ ad , m , t (^:]) ('optional' lines in Algo- 
rithm^. Calculate the expectations 

q^i-.t q^i.t I 

°7,l,t! ■ • ' ' J j,15,t 

N 

= y y W ^ [^7,m,t: • • • ) ^-y^t) ^T.S.tj • • • , S7,15,tj ( z l:tJ • 

and update t+ i = A \ S^ t , . . . , S^i 5jt ). 

Finally, before ending this section, we list in Table Q] 
some important variables used to describe the EM algorithms 
throughout the section. 

IV. Experiments and results 

We compare the performance of the parameter estimation 
methods described in Section [ill] for the constant velocity 
model in Example Q] where the parameter vector is 

9 = (A b , Xf,p d ,p s , fi bp , ^i bv ,a bp ,a bv7 al p7 al v ,a y ) . 

Note that the constant velocity model assumes the position 
noise variance er^ = 0. All other parameters are estimated. 

A. Batch setting 



1 ) Comparison of methods for batch estimation: We run 
two experiments using the constant velocity model in the batch 
setting. In the first experiment, we generate an observation 
sequence of length n = 100 by using the parameter value 

9* = (0.2, 10, 0.90, 0.95, 0, 0, 25, 4, 0, 0.0625, 4) 

and window size k — 100. This particular value of 9* creates 
on average 1 target every 5 time steps, and the average life of 
a target is 20 time steps. Therefore we expect to see around 
4 targets per time. 

Using the generated data set, we compare the performance 
of the three different methods for batch estimation, which are 
SMC-EM and MCMC-EM (two different implementations of 
SAEM in Algorithm[Qi for MLE, and MCMC for the Bayesian 
estimation Q. For SMC-EM, we used N = 200 particles 
to implement the SMC method based on the L-best linear 
assignment to sample associations, where we set L = 10, 
the details of the SMC method are in Appendix [B] For the 
MCMC-EM, in each EM iteration we ran 5 MCMC steps and 
the last sample is taken to compute the sufficient statistics, 
i.e. N = 1. For both the SMC and MCMC implementations 
of SAEM, jj = j~°' s is used as the sequence of step-sizes 
for all parameters to be estimated, with the exception that 
jj — j~°- 55 is used for estimating a^. v . That is to say, in 
the SAEM algorithm, Sfy <n , Sfy n , and S^ n are calculated 
using 7j = j~ ' 55 , and S^\ x n is calculated twice by using 
jj = j~°- 55 and 7j = j~ '^ separately (since it appears both 
in the estimation of a^ v and p s ), and for the rest of S^m,™ 
7j = is used. For Bayesian estimation, the following 

conjugate priors are used: 

Ps , Pd ~ Unif (0, 1), X b ,X f ~ 5(0.001, 1000), 
o* v , <J% a 2 bp , a 2 bv ~ J£(0.001, 0.001), 

Vbx\o~ bp ~ jV(0.1, lOOOOfcp), l^byWlp ~Jv r (-0.1,1000cr^). 

Figure |2] shows the results obtained using SMC-EM, MCMC- 
EM and MCMC after 2000, 3 x 10 5 , 3 x 10 5 iterations 
respectively. For the Bayesian estimate, we consider only the 
last 5000 samples generated using MCMC as samples from the 
true posterior p{9\y\- n )- For comparison, we also execute the 
EM algorithm with the true data association and the resulting 
9* estimate will serve as the benchmark. Note that given the 
true association, the EM can be executed without the need for 
any Monte Carlo approximation, and it gave the estimate 

9*' z = (0.18, 9.94, 0.92, 0.97, -1.98, 0.91, 17.18, 5.92, 

0,0.027,4.01). 

The z in the superscript is to indicate that this value of 9 
maximises the joint probability density of yim and zi- n , i.e. 

9*' z = argmaxlogp(,(yi : „,zi : „) 

which is different than 9ml- However, for a data size of 100, 
9*' 2 is expected to be closer to 9ml than 9* is, hence it is 
useful for evaluating the performances of the stochastic EM 
algorithms we present. From Figure |2] we can see that almost 
all MLE estimates obtained using SMC-EM and MCMC-EM 
converge to values around 9*' z , except for a 2 v from SMC-EM 



Sections IIII-AI and IIII-ATl 

Sm,n, rn = X : 15, Sufficient statistics of the MTT model 

Sf^ n , m = 1 : 15, Expectation of S mn conditional to yi :n 

S!^ n , rn = 1 : 7, Expectation of S m ,n conditional to yi :n and z\ :n 

Section BiTaII 

Sf^ n , Monte Carlo estimation of n 

S^f t m,n ' Weigh ted average of Sm ;n , . . . , Sm,n for the SAEM algorithm 

section nnmi 

Sm,n, m = 1 : 7, Sufficient statistics of a single GLSSM 
5m, t, m = 1 : 7, Incremental functions for S m ,n 
S m ,n,ij, The (i, j)'fh element of S m ,n 
Sjn,t,ij, The (i,i)'th element of s m , t 

Tm,t,ij , Forward smoothing recursion (FSR) function for SVn.t.ij 



Section HIFB2T 



Sm,t, rn = 1 : 15, Incremental functions for S m ,n 
1* m = 1 : 7, FSR function for S m ,t 

t fc' FSR function for m'th sufficient statistic of the fc'th alive target 
at time t 

T e t , - ., The (i, j)fh element of T , , 
P m ,t,k,ij,qm,t,k,ij,r mt t,k,ij, Variables to write T mitjMj 
^rn t k Expectation of the m'th sufficient statistic of the fc'th alive target 
at time t 

lL,t,k,ir The fti)'* element of s e m,t,k 

5®, ,, Contributions of the alive targets at time t to S® + 

CI l IV C- » Tfi , t ^ f f i , L 

S® ead t , Contributions of the dead targets up to time t to S 1 ^ t 
Section ImHOl 

^-y.m t' Online estimation of t using 6\-.t 

^%,m,t,fc,ij j Q"f,m,t,k,ij 5 ^* , y,m,t,fc,ij • Valuables to write T^^m,t,k,ij 
S • t , Online estimation of 5 e , . using S1.4 

S H* , t , Online estimation of S , , , using 0i-t 
&y*m,t> Online estimation of t using 1:t 
S-y : m,t> rn = 8 ; 15, Online calculation of Sm,n 

using 8\-t 
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lower x-axis: number of SMC-EM iterations, upper x-axis: number of MCMC-EM iterations ( xlO' 1 ) 



Fig. 2. Batch estimates obtained using the SMC-EM (thin lines) and MCMC-EM (bold lines) algorithms for MLE and MCMC algorithm for the Bayesian 
estimate (histograms). 9*' z is shown as a cross. Upper and lower x-axes show the number of EM iterations for MCMC-EM and SMC-EM, respectively. 



has not converged within the experiment running time. The 
histogram of the Bayesian MCMC samples in Fig [2] indicate 
that the modes of the posterior probabilities obtained using 
MCMC are around 9* > z as well. 

The computational complexity of one MCMC move for 
updating Z\.. n , for a fixed parameter 9, is dominated by a 
term which is 0(X x T 2 Xb ), where X x = Ah/(1 — p s ) is the 
average number of targets per time. On the other hand, the 
cost of the E-step of SMC-EM is dominated by a term which 
is <D(TNLX y ), where X y = X x (l + p d ) + X f and L is the 
parameter used in L-best assignment. (For a more detailed 
computational analysis for SMC based EM algorithms see 
Appendix [C]) In realistic scenarios, one expects the SMC E- 
step, being power three in the number of targets and clutter, 
to be far more costly then the MCMC E-step, which results in 
the SMC-EM algorithm being far slower, as in our example. 
We observed, but not shown in Figure [2] that the 9 samples 
of the MCMC Bayesian estimate reached the true values after 
approximately 2e4 iterations, earlier than MCMC-EM's 7.5e4 
iterations. This is because MCMC-EM forgets its past more 
slowly than MCMC Bayesian due to dependance induced by 
the stochastic approximation step (TT~3T > - Although in this case 
MCMC Bayesian seems preferable, we need to be careful 
when choosing the prior distribution for 9 especially when 
data is scarce as it may unduly influence the results. 

The reason why SMC-EM is comparatively slow to con- 
verge is because of the costly SMC E-step. Often, the pa- 
rameters can be updated without a complete browse through 
all the data. We may thus speed up convergence by applying 
SMC online EM (Algorithm [3} on the following sequence of 
concatenated data 

[yi:n, yirnj ■ • •], 

Figure [3] shows both our previous SMC-EM estimates (vs 
number of iterations) in Figure |2] and the SMC online EM 
estimates (vs number of passes over the original data yi : „) on 
the concatenated data; and we note that both algorithms are 



started with the same initial estimate of 9*. Noting that the 
computational cost of one iteration of the SMC-EM algorithm 
and the computational cost of one pass of SMC online EM 
algorithm over the data are roughly the same, we observe that 
cr xv and the other parameters converge much quicker in this 
way. The caveat though is that there is now a bias introduced 
due to the discontinuity at the concatenation points, e.g. y„ 
may correspond to the observations of many surviving targets 
whereas yi may be the observations of an initially target free 
surveillance region. This discontinuity will effect, especially, 
survival p s , detection pd, and any other parameter depending 
crucially on a correct Kf estimate over time. However it will 
have little effect on the parameters fi bx , ^i by ,a bp ,al v ,al v ,a y 
which govern the dynamics of the HMM associated with a 
target. In conclusion, one way to estimate 9* in a batch setting 
using SMC-EM is by (i) first running SMC online EM on 
[yi:„, yim, . . .] until convergence to get an estimator 9' of 9*, 
(ii) and then run the batch SMC-EM initialised at 9'. 

2) Batch estimation on a larger data set: In the sec- 
ond experiment we compare the batch estimation algorithms, 
MCMC-EM and the Bayesian method, with a larger data set 
which has more targets and observations. Recall that the SMC- 
EM algorithm is based on a SMC algorithm which uses the 
i-best linear assignments and its computational complexity is 
approximately polynomial of order 3 in X y = X x + (1 +Pd)Xf. 
Therefore, the SMC-EM algorithm would take a long time to 
execute and is left out of the comparison in this experiment. 
We created a data set of n = 150 time steps by using the 
parameter 

9* = (0.65, 22.5, 0.90, 0.95, 0, 0, 25, 4, 0, 0.0625, 4). 

with window size n = 150 for the surveillance region. With 
this choice, we see approximately 13 targets per time. Figure 
E] shows the results obtained from the MCMC-EM and the 
Bayesian method for estimating 9* . When the true association 
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Number of iterations (passes over data) 0.25 
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Number of iterations (passes over data) 



- batch 
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500 1000 1500 2000 500 1000 1500 2000 
Number of iterations (passes over data) Number of iterations (passes over data} 



Fig. 3. Comparison of online SMC-EM estimates applied to the concatenated data (thicker line) with batch SMC-EM 



is given, the EM algorithm finds 0*' z for this data set as 
6*- z = (0.63,22.88,0.90,0.95,0.15,-0.68,27.96,3.32, 

0,0.065,3.98). 

We can see that both methods work well for this large data 
set. It is worth mentioning that MCMC Bayesian converged 
to the stationary distribution after le5 iterations (not shown in 
the figure), while MCMC-EM converged after 3e5 iterations. 

B. Online EM setting 

We demonstrate the performance of the SMC online EM in 
Algorithm [3] in two settings. 

1 ) Unknown fixed number of targets: In the first experiment 
for online estimation, we create a scenario where there are a 
constant but unknown number of targets that never die and 
travel in the surveillance region for a long time. That is, K§ = 
K (which is unknown and to be estimated), Ab = and p s 
1 . We also slightly modify our MTT model so that the target 
state is a stationary process. The modified model assumes that 
the state transition matrix F is 



path of the 1 st target for 1 000 time steps 



F = 



0.99/ 2x2 

02x2 



A/2X2 
0.99/2X2 



(27) 



and G, W and V are the same as the MTT model in Example 
Q] The change is to the diagonals of matrix F which should 
be ^2x2 for a constant velocity model. However, 0.99/2x2 
will lead to non-divergent targets, i.e. having a stationary 
distribution; see Figure for a sample trajectory. We create 
data of length n — 50000 with K = 10 targets which are 
initiated by using fi bx = 0, [i hy = 0,a% x = 25, a\ v = 4. 
The other parameters to create the data are pd = 0.9, Ay = 
10, a 2 = 0.01, o~f — 4, and the window size k = 100. 



X,(2) 



X,(1) 



Fig. 5. The position of target no. 1 evolving in time for the first 1000 time 
steps with modified constant velocity model with F in )27t 



Figure [6] shows the estimates for parameters pd,Xf, a xv ,ay 
using the SMC online EM algorithm described in Algorithm 
|3] when K® = K = 10 is known. We used L = 10 and 
N = 100, and 74 = t~°- s is taken for all of the parameters 
except a xv , where we used 74 = i~ - 55 . The burn-in time, 
until when the M-step is not executed, is tb — 10. We can 
observe the estimates for the parameters quickly settle around 
the true values. Note that fj, x , ^, y ,a 2 p ,a 2 v are not estimated 
here because they are the parameters of the initial distribution 
of targets which have no effect on the stationary distribution of 
a MTT model with fixed number of targets, and thus they are 
not identifiable by an online EM algorithm 1 1 Oil - Note that the 
online MLE procedure is based on the fact that the parameters 
of the initial distribution will have a negligible effect on 
the likelihood of observations y* for large t. In practice, 
the parameters of the initial distribution can be estimated by 
running a batch EM algorithm for the sequence of the first few 
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1 2 3 4 5 

Number of M CMC-EM iterations (xlO 1 



Fig. 4. Batch estimates obtained from a large data set using the MCMC-EM (bold lines) algorithm for MLE and MCMC for Bayesian estimates (histograms). 
9*' z is shown as a cross. Upper and lower x-axes show the number of EM iterations for MCMC-EM and SMC-EM, respectively. 



observations, such as yi : 5o, and fixing all other parameters to 
the values obtained by SMC online EM. 






* * * 


t = 200 


s t - 300 _ 


" t = 400 






^ t - 500 



9 10 11 12 13 

Number of targets K 




1 2 3 4 5 

num. of samp, (t) x 1Q 4 



2 3 4 5 
num. of samp, ft) x 1 Q 4 



Fig. 7. Left: estimates of pe 1 . t (yi:t\K) (normalised by t) for values 
t = 100 . . . , t = 500 and for it = 6, . . . , K = 15. Right: Estimates of 
PBi-t (yi:t 1-^") normalised by t for values K = 6, . . . , K = 15, K = 10 is 
stressed with a bold plot. 



Fig. 6. Online estimates of SMC-EM algorithm (Algorithm |5J for fixed 
number of targets. True values are indicated with a horizontal line. Initial 
estimates for p^, ■, cr^. v , a'y are 0.6,15,0.25,25; they are not shown in 
order to zoom in around the converged values. 



The particle filter in Algorithm [3] which we used to 
produce the results in Figure [3] has all its particles having 
the same number of targets, which is the true K. However, 
K can be estimated by running several SMC online EM 
algorithms with different possible K's, and comparing the 
estimated likelihoods pg 1 . t (yi :t \K) versus t. Figure [7] shows 
how the estimates of pe 1 . t (yi :t \K) for values K — 6, ... ,15 
compare with time. Both the left and right figures suggest that 
pg 1 . t (yi :t \K) favours K = 10 starting from t = 100 and the 
decision on the number of targets can be safely made after 
about 200 time steps. We have also checked this comparison 
with different initial values for 9 and found out that the 
comparison is robust to the initial estimate 9q. 



2) Unknown time varying number of targets: In the second 
experiment with online estimation, we consider the constant 
velocity model in Example Q] with a time-varying number of 
targets, i.e. Af, > and p s < 1. We generated a set of data of 
length n — 10 5 using parameters 

6* = (0.2, 10, 0.90, 0.95, 0, 0, 25, 4, 0, 0.0625, 4) 

and we estimated all of them (except a^ p — 0). Again, we 
used L = 10 and N = 200, and j t = t~ - 8 is taken 
for all of the parameters except a^. v for which we used 
jt — t~ - 55 . The online estimates for those parameters are 
given in Figure [8] (solid lines). The initial values are taken to 
be O = (0.8,0.5,0.6,13,-1,-1,1,1,16,0,0.25,25) which 
is not shown in the figure in order to zoom in around 9*. 
We observe that the estimates have quickly left their initial 
values and settle around 9*. Also, the parameter estimates 
for the initial distribution of newborn targets have the largest 
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Fig. 8. Estimates of online SMC-EM algorithm (Algorithm |5J for a time varying number of targets, compared with online EM estimates when the true data 
association ({Zt}t>i) is known (black dashed lines) and SMC online EM estimates when the birth death information ({K% , C"}t>i) is known (red dashed 
lines). For the estimates in case of known true association and in case of known birth-death information, 61000,2000,. ..,100000 ws shown only. True values 
are indicated with a horizontal line. The initial value 6*o = (0.8, 0.5, 0.6, 13, —1, — 1, 1, 1, 16, 0, 0.25, 25) is not shown in order to zoom in around 9* 



oscillations around their true values which is in agreement 
with the results in the batch setting. 

Another important observation from Figure [8] is that there 
is bias in the estimates of some of the parameters, namely 
Pd, Xf,a^ v ,(j^. v ,ay. This bias arises from the Monte Carlo 
approximation. To provide a clearer illustration of this Monte 
Carlo bias, we compared the SMC online EM estimates with 
the online EM estimates we would have if we were given the 
true data association, i.e. {Z t } t >i- The dashed lines in Figure 
|8]show the results obtained when the true association is known; 
for illustrative purposes we plot every 1000'th estimate only, 
hence the sequence 6*1000,2000,.. .,100000- 

The source of the bias in the results is undoubtedly due 
to the SMC approximation of p$(zi: n \yi :n ). However, we are 
able to pin down more precisely which components of Zi :n are 
being poorly tracked. We ran the SMC online EM algorithm 
for the same data sequence, but this time by feeding the 
algorithm with the birth-death information, i.e. {K^,C'}t>x- 
Figure [8] shows that when {K^, C t s }i>i is provided to the 
algorithm, the bias for some components drops. This indicates 
that (i) the bias in the MTT parameters is predominantly due 
to the poor tracking of the birth and death times by our SMC 
MTT algorithm and (ii) with knowledge of the births and 
deaths, the unknown assignments of targets to observations 
seem to be adequately resolved by the L-best approach since 
the bias in the target HMM parameters diminishes. Therefore, 
the bottle neck of the SMC MTT algorithm is birth/death 
estimation and, generally speaking, a better SMC scheme for 
the birth-death tracking may reduce the bias. Note that when 



the number of births per time is limited by a finite integer, all 
the variables of Z t i.e. {K\,K ! t \C s t ,Cf,A t ) can be tracked 
within the L-best assignment framework, and we expect in 
this case the bias to be significantly smaller. However, since 
in our MTT model the number of births per time is unlimited 
(being a Poisson random variable), we cannot include birth- 
death tracking in the L-best assignment framework; see the 
SMC algorithm in Appendix IE1 for details. 

3) Tuning the number of particles N: It is expected that 
a reasonable accuracy of SMC target tracker is necessary 
for good performance in parameter estimation. Obviously, 
there is a trade off between accuracy of SMC tracking and 
computational cost, and this trade off is a function of N, the 
number of particles. This raises the following question: how 
do we identify if the number of particles is adequate for the 
SMC online EM algorithm for a real data set given that 9* is 
unknown? We propose a procedure to address this issue. For 
the chosen value N: 

1) Run SMC online EM on the real data set with N 
particles to obtain an estimate 9 of the unknown 9*. 

2) Simulate the MTT model with 9 for a small number of 
time steps to obtain a data set for verification. 

3) Run the SMC target tracker for the simulated data with 
9 = 9 known. 

4) If the target tracking accuracy is "bad", increase N and 
return to step 1; else stop. 

The tracking accuracy can roughly be measured by comparing 
Kf with its particle estimate which is suggestive of the birth- 
death tracking performance, which we have identified to have 
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a significant impact on the bias of the estimates as shown in 
Figure [8] 

V. Conclusion and Discussion 

We have presented MLE algorithms for inferring the static 
parameters in linear Gaussian MTT models. Based on our 
comparisons of the offline and online EM implementations, 
our recommendations to the practitioner are: (i) If batch 
estimation permissible for the application then it should always 
be preferred, (ii) Moreover, MCMC-EM should be preferred 
as batch SMC-EM has the disadvantage of slow conver- 
gence of some parameters while online SMC-EM applied 
to concatenated data, although converges quicker then batch 
MCMC-EM, induces some bias for certain parameters due 
to the discontinuity caused at the concatenation boundaries. 
Furthermore, SMC tracker does not scale well with the average 
number of targets per time and clutter rate; see Sec calculation 
in IIV-AI (iii) For very long data sets (i.e. large time) and 
when there is a computational budget, then online SMC-EM 
seems the most appropriate since the it is easier to control 
computational demands by restricting the number of particles. 
We have seen that in online SMC-EM there will be biases in 
some of the parameter estimates if the birth and death times are 
not tracked accurately. The particle number should be verified 
for adequacy as recommended in Section IIV-B3I 

We have not considered other tracking algorithms that work 
well such as those based on the PHD filter lf3oL 32 1 which 



could be used provided track estimates can be extracted. 
The linear Gaussian MTT model can be extended in the 
following manner while still admitting an EM implementation 
of MLE. For example, split-merge scenarios for targets can be 
considered. Moreover, the number of newborn targets per time 
and false measurements need not be Poisson random variables; 
for example the model may allow no births or at most one 
birth at a time determined by a Bernoulli random variable. 
Furthermore, false measurements need not be uniform, e.g. 
their distribution may be a Gaussian (or a Gaussian mixture) 
distribution. Also, we assumed that targets are born close to 
the centre of the surveillance region; however, different types 
of initiation for targets may be preferable in some applications. 

For non-linear non-Gaussian MTT models, Monte Carlo 
type batch and online EM algorithms may still be applied 
by sampling from the hidden states X t 's provided that the 
sufficient statistics for the EM are available in the required 
additive form DSQ . In those MTT models where sufficient 
statistics for EM are not available, other methods such as 
gradient based MLE methods can be useful (e.g. Poyiadjis 
et al. fH). 



Appendix 

A. Recursive updates for sufficient statistics in a single 
GLSSM 



Referring to the variables in Section llH-Bll the intermediate 
functions for the sufficient statistics in ( TT~8T > can be written as 

T m ,t,ij(xt, c l:t) = x t Pm,t,ijXt + Q m ,t,ij X t + ^m,t,ij 



where i, j — 1, . . . , d x for m — 1, 3,4, 5, 7; i = 1, . . . , d x ,j = 
1, . . . , d y for m = 2; and i = 1, . . . , d x , j = 1 for in = 6. All 
P m ,t,ij's, q m ,t,i/s and f m ,t,y's are d x x d x matrices, 4x1 
vectors and scalars, respectively. Forward smoothing is then 
performed via recursions over these variables. Start at time 1 
with the initial conditions P m ,i,ij = Od x xd x , q~m,i,ij = Od x xi, 
and f m ,i t ij — for all m except -Pi.i.jj = cfe^ej, Pr,i,ij = 
e^ej, q 2 ,i,ij = ctvi{j)<Zi, and q 6 ,i,ii = e 4 . At time t + 1, 
update 



Pi,t+i,ij — Bj P~l,t,ijB t 



C t+l e i e j 



qi,t+i,ij = Bfqi,t,ij + B J (Pi,t,v + P iJ,ij) b t 



p 



2,t+l,ij 



Jd-r xd x 



tr (P 1)t ,i J -E t | t+1 ) + qlt^bt + bJP 1:t ,ijb t 



q~2,t+i,ij = Bl q~2,t,ij + cf +1 y t+ i(j)ei 

f2,t+i,%j = f 2 ,t,ij + ql, t +i,ijh 
P 



3,t+l,ij 



B T t (P s , t ,ij + e t ej) B t 



93,t+i,y — B^q 3 j,ij + B t (Pa,t,ij + P 
r3,t+i,ij = f3,t,ij + tr ((P3,t,ij + e»ej) %t\t+i) + Q~3,t,tj b t 



+ bj (P 3 ,t,ij + e,ej) b t 



Pi,t+x,ij — B t PA,t,ijB t + eie^ 
q~4,t+i,ij = Bjq^ t ,ij + Bj {Pi,t,ij + P±,t,ij) bt 



f*4,t+l, 



•trLR 



Ps,t+i,ij — Bj ' P$,t,ijBt + eiej B t 



4,t,ij^t\t+l 

j 



) +qJ,t,ijbt + bJPA,t,ijb t 



?5,t+i,y = B tq 5 , t ,ij + Bj (P 5 ,t,ij + P<f,t,ij) bt + ejbla 
rs,t+i,ij = rs,t,ij + tr (P 5 , t ,ij^t\t+i) + ql,t,ij b t + bfP 5 ,t,ijb t 

P&,t+l,il = Od x xd x 



q~s,t+i,n = B t qe,t,n 
^6,t+i,ii = re.f.ii + qjj+ijibt 

P\,t+l,ij — Bj (P7,t,ij) B t 

97,i+i,w = Bjq 7it ,ij + Bj (Pr,t,ij + P^t.ij) bt 

rr,t+i,ij = fr,t,ij + tr (P7,t,ij^t\t+i) + Q~7,t,ij b t + bJP 7 ,t,ijbt 

For the online EM algorithm, we simply modify the update 
rules by multiplying the terms on the right hand side contain- 
ing e t or /d x xd x by j t +i and multiplying the rest of the terms 
by (1 -7t+i). 

B. SMC algorithm for MTT 

An SMC algorithm is mainly characterised by its proposal 
distribution. Hence, in this section we present the proposal 
distribution qe{zt\zi-t-i,yi:t), where we exclude the super- 
scripts for particle numbers from the notation for simplicity. 
Assume that Zx-.t-i is the ancestor of the particle of interest 
with weight Wt-i- We sample z t = (k% , cf , cf , k{ , a±) and 
calculate its weight by performing the following steps: 

• Birth-death move: Sample k\ ~ VO(-;Xb) and c|(j) ~ 
B£(-; Ps ) for j — 1 , . . . , fcf_ x . Set k s t = Y*ti <4 and 
construct the kf x 1 vector if from cf . Set kf — kf + k\ 
and calculate the prediction moments for the state. For 
j = l,...,kf, 
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- if j < k s t , set n t \ t _ 



Ffj, 



^>t\t-i,j — F^t-i\t-i,i s t {])F T + W. 



t-i|t-i,if(i) 



and 



- if j > k s t , set Ht\t—i,j = and E>t\t-i,j = E&. 
Also, calculate the moments of the conditional observa- 
tion likelihood: For j — 1, . . . , kf, /j% ■ = Gfj, t \t-i j an d 
E",. = GZ tlt _ ltj G T + V. 

Detection and association Define the kf x (kf + kf) 
matrix D t as 



'logfc^^^E^.)) if J <kl 
D l U.j) = { log ifi = j-k y t , 
-oo otherwise. 



and an assignment is a one-to-one mapping at : 
{1, . . . , kf} -> {1, . . . , kf + kf}. The cost of the as- 
signment, up to an identical additive constant for each 

a t is 

d{D u a t ) = Y J D t {j,a t {j)). 

i=l 

Find the set Al = {ck,i, ■ ■ ■ > a t,L} of L assignments 
producing the highest assignment scores. The set Al can 
be found using the Murty's assignment ranking algorithm 
[181. Finally, sample a t — a t j with probability 



exp[rf(A,Qtj)] 
E 7 *'=i exp[d( D t , a t j>)] 



J = 1, 



• L 



Given at, one can infer cf (hence if), kf, kf and the 
association a t as follows: 



4{k) = 



1 if a t (k) < kf, 
if a t (k) > k v t . 



Then kf — c f(k), kf = kf — kf, if is constructed 

from cf, and finally 

ot(fc) = a t (if(k)), k = l,...,kf. 

• Reweighting: After we sample z t = (jtf , cf , cf , k{ , a t ^j 

from <7e(^t|-Zi:t-i) y*)> we calculate the weight of the 
particle as in (fl4l i. which becomes for this sampling 
scheme as 

L 

w t ocwt-iXf ' exp[d(Dt, &t,j)]- 
i=l 

C. Computational complexity of SMC based EM algorithms 

1) Computational complexity of SMC filtering: For simplic- 
ity, assume the true parameter value is 9. The computational 
cost of SMC filtering with 9 and N particles, at time t, is 



C SM c(0,t,N) 



cxN - 

resampling 



N 

E 



C3 



birth-death sampling 



dl ( Ci Kf + c 5 KfKf) + c 6 L (Kl 



(i) 



moments and assignments 



where c\ to C6 are constants and c 3 is for sampling from the 
Poisson distribution. If we assume that SMC tracks the number 
of births and deaths well on average then we can simplify the 
term above 



C SM c(9, t, N) « N [ci, 3 + c 2 Kf_ x 



c e L (Kf 



Kf) 3 



where ci,3 = c\ + C3. The process {-ftff } t >i is Markov and its 
stationary distribution is V(\ x ) where \ x — ^ . Also Kf = 
Kf + K{ and for simplicity we write Kf s» paKf. Therefore 
the stationary distribution for {Kf + Kf} t >i is approximately 
that of {(1 + p d )Kf + K{} t >i which is V(X y ) where X y = 
X x (l + Pd) + A/. Therefore, assuming stationarity at time i 
and substituting E-p(\) (^ 3 ) = A 3 + 3A 2 + A, the expected cost 
will be 



[C SMC (9,t,N)}^N 



ci, 3 + (c 2 + d 3 [c 4 + c 5 (p d + A/)]) A 
+ c s p d A^ + c 6 L (A 3 + 3Xl + X v 



2) SMC-EM for the batch setting: The SMC-EM algorithm 
for the batch setting first runs the SMC filter, stores all its 
path trajectories i.e. {^i?l}i<j<jv an d then calculates the 
estimates of required sufficient statistics for each Z\. n by using 
a forward filtering backward smoothing (FFBS) technique, 
which is bit quicker then forward smoothing. Therefore, the 
overall expected cost of batch SMC-EM applied to data of 



SMC-EM 



C FFBS (9 1 n,N) + YCi 



t=l 



SMC { 



,t,N) + c 7 



Murty (worst case) 



where C7 is the cost of the M-step, i.e. A. Let us denote the 
total number of targets up to time n is M and let L\, . . . , Lm 
be their life lengths. The computational cost of FFBS to 
calculate the smoothed estimates of sufficient statistics for a 
target of life length L is 0(d^,L). Therefore, 

C PPSS (9,n,N)=Y,Y, c * d x L ™- 

i—l m— 1 

Assume the particle filter tracks well and and Lm, m = 
1, . . . , M W for particles i = 1, . . . , N are close enough to 
L m , and M, the true values, for m = 1, . . . , M. Then, we 
have 

N M 

C FFB s(0,n,N) c s d l L m- 

i—l m— 1 

The expected values of L m and M are 1/(1 — p s ), nXb, 
respectively. Also assume stationarity at all times so that the 
expectations of the terms CsMc{0,t,N) are the same and we 
have 

[Cffbs($! n, N)] » c 8 A^d 3 A 6 (l - p,)" 1 . 

As a result, given a data set of n time points, the overall 
expected cost of SMC-EM for the batch setting per iteration 
is 

J E e [Csmc-em] w Ee [C FFBS (9, n, N)\ + n£ e [C S uc(0, t, N)] + c 7 . 
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3) SMC online EM: The overall cost of an SMC online 
EM for a data set of n time points is 

n 

CsMConEM » ^ IPW*' f ' iV ) + C 'sMc(f , t, AQ + C ?l ■ 
4=1 

The forward smoothing recursion and maximisation used in 
the SMC online EM requires 



N 



c FSR (e,t,N) = J2c9Kt4 



calculations at time t for a constant eg, whose expectation is 

[C FSR {6,t,N)} = c*)N\ b (l -p s y l dl. 

at stationarity. The overall expected cost of an SMC online 
EM for a data of n time steps, assuming stationarity, is 

Ee [Csmcoiiem(0, n, N)\ 

« n (E 9 [C7 F sr(0, t, iV)] + Eg [C SM c{0, t, N)] + c 7 ) . 
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